Add use_real parameter to Z-Image for platform compatibility#13824

Open
st7109 wants to merge 1 commit into
huggingface:mainfrom
st7109:optimize-z-image

@st7109 st7109 commented May 28, 2026

Fixes # (issue)

Add an optional real-number RoPE implementation to the Z-Image transformer and controlnet. When use_real=True, the rotary position embeddings are represented as (cos, sin) tuples instead of complex numbers, which lets the model run on platforms that don't support complex arithmetic (e.g., Cambricon MLU).

Changes:

  • Add apply_rotary_emb() with use_real parameter supporting both complex and real computation
  • Propagate use_real through ZSingleStreamAttnProcessor, ZImageTransformerBlock, RopeEmbedder, ZImageTransformer2DModel, and controlnet variants
  • Update _prepare_sequence and _build_unified_sequence to handle (cos, sin) tuples
  • Default use_real=False maintains backward compatibility
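
For reviewers less familiar with the two modes, the equivalence behind the first bullet can be sketched without any framework. This is an illustrative pure-Python sketch, not the code in this PR: the helper `rope_angles`, the `theta` value, and the interleaved pairing of channels are assumptions made for the example, and only the name `apply_rotary_emb` and the `use_real` flag come from the description above.

```python
import cmath
import math


def rope_angles(position, head_dim, theta=10000.0):
    """Hypothetical helper: one rotation angle per channel pair for a token position."""
    return [position / theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rotary_emb(x, angles, use_real=False):
    """Rotate consecutive channel pairs (x[2i], x[2i+1]) by angles[i].

    Assumes interleaved pairing; the real model operates on tensors, not lists.
    """
    out = []
    for i, angle in enumerate(angles):
        a, b = x[2 * i], x[2 * i + 1]
        if use_real:
            # Real path: explicit 2x2 rotation using a (cos, sin) pair.
            cos, sin = math.cos(angle), math.sin(angle)
            out += [a * cos - b * sin, a * sin + b * cos]
        else:
            # Complex path: multiply (a + bi) by the unit complex number e^{i*angle}.
            z = complex(a, b) * cmath.rect(1.0, angle)
            out += [z.real, z.imag]
    return out


x = [0.5, -1.0, 2.0, 0.25]
angles = rope_angles(position=3, head_dim=len(x))
complex_out = apply_rotary_emb(x, angles, use_real=False)
real_out = apply_rotary_emb(x, angles, use_real=True)

# Largest element-wise difference between the two paths.
max_diff = max(abs(c - r) for c, r in zip(complex_out, real_out))
print(max_diff)
```

Both branches compute the same rotation, so any difference comes only from floating-point rounding. The real path needs nothing beyond multiplies and adds, which is why it suits backends without complex dtype support.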

Tested on Cambricon MLU and NVIDIA A100: both successfully generate 1024x1024 images, with numerical equivalence (max diff < 1e-6) compared to complex mode.
The test case is taken from https://huggingface.co/Tongyi-MAI/Z-Image-Turbo. When testing on the Cambricon MLU platform, set use_real=True; the run then generates the image below:

[Image: z_image_mlu_real_rope]

What does this PR do?

Before submitting

Who can review?

@sayakpaul

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions (bot) added the models and size/L (PR with diff > 200 LOC) labels on May 28, 2026
Add optional real-number RoPE implementation to Z-Image transformer and
controlnet. When use_real=True,
the rotary position embeddings use (cos, sin) tuples instead of complex
numbers, enabling the model to run on platforms that don't support complex
arithmetic (e.g., MLU).

Changes:
- Add apply_rotary_emb() with use_real parameter supporting both complex
  and real computation
- Propagate use_real through ZSingleStreamAttnProcessor, ZImageTransformerBlock,
  RopeEmbedder, ZImageTransformer2DModel, and controlnet variants
- Update _prepare_sequence and _build_unified_sequence to handle (cos, sin)
  tuples
- Default use_real=False maintains backward compatibility
- Replace hardcoded cuda autocast with device-aware torch.autocast for Z-Image

Tested on MLU: successfully generates 1024x1024 images with
numerical equivalence (max diff < 1e-6) compared to complex mode.
@st7109 st7109 force-pushed the optimize-z-image branch from 3f13d27 to 4ecd72f Compare June 2, 2026 04:07

st7109 commented Jun 2, 2026

@sayakpaul Hello, could you please review this commit? If you have any suggestions, let me know. Thanks.
